Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Moving Cyphal/UDP to multicast #253

Merged
merged 170 commits into from Mar 12, 2023

Conversation

maksimdrachov
Copy link
Member

@maksimdrachov maksimdrachov commented Oct 9, 2022

This MR is based on this proposol discussed in the OpenCyphal forum.

In short, the changes can be broken down into three pieces:

1. Datagram header format

Current:

uint8 version = 0
uint8 priority
void16 
uint32 frame_index_eot
uint64 transfer_id
void64

Proposal:

uint8 version = 1   # UPDATE
uint8 priority
uint16 source_node_id   # NEW
uint32 frame_index_eot
uint64 transfer_id
void64

Note: version will be bumped to 1, however no backward-compatibility changes are made (since the protocol is still in development).

2. Message

Current:

    fixed         reserved
   (9 bits)       (3 bits)
   ________          _
  /        \        / \
  11101111.0ddddddd.000sssss.ssssssss
  \__/      \_____/    \____________/
(4 bits)    (7 bits)      (13 bits)
  IPv4      subnet-ID     subject-ID
multicast   \_______________________/
 prefix             (23 bits)
            collision-free multicast
               addressing limit of
              Ethernet MAC for IPv4

Proposal:

    fixed   message  reserved
   (9 bits) select.  (3 bits)
   ________   res.|  _
  /        \     vv / \
  11101111.0ddddd00.000sssss.ssssssss
  \__/      \___/      \____________/
(4 bits)   (5 bits)       (13 bits)
  IPv4     subnet-ID      subject-ID
multicast   \_______________________/
 prefix             (23 bits)
            collision-free multicast
               addressing limit of
              Ethernet MAC for IPv4

3. Service

Current: regular unicast

Proposal:

    fixed          service
   (9 bits)  res.  selector
   ________      ||
  /        \     vv
  11101111.0ddddd01.nnnnnnnn.nnnnnnnn
  \__/      \___/   \_______________/
(4 bits)   (5 bits)     (16 bits)
  IPv4     subnet-ID     node-ID
multicast   \_______________________/
 prefix             (23 bits)
            collision-free multicast
               addressing limit of
              Ethernet MAC for IPv4

TODO

  • Datagram
  • Message
  • Service
  • Implement Scott's changes
  • Add anonymous mode
  • Update documentation

@maksimdrachov
Copy link
Member Author

maksimdrachov commented Oct 9, 2022

Question: Is anonymous listening possible or not? There seems to be some contradiction:

pycyphal/transport/udp/__init__.py:20:

The concept of anonymous transfer is not defined for Cyphal/UDP

vs

pycyphal/transport/udp/__init__.py:302

>>> tr_1 = pycyphal.transport.udp.UDPTransport('127.9.15.254', local_node_id=None)  # Anonymous is only for listening.

@maksimdrachov
Copy link
Member Author

maksimdrachov commented Oct 9, 2022

Question: Does it make sense to move source_node_id to class Frame (defined in pycyphal/transport/commons/high_overhead_transport/_frame.py)?

@pavel-kirienko
Copy link
Member

Is anonymous listening possible or not? There seems to be some contradiction:

There is no contradiction: "The concept of anonymous transfer is not defined for Cyphal/UDP" means that you can't send anonymous transfers, but you can still create anonymous nodes that cannot communicate (they can only listen).

Does it make sense to move source_node_id to class Frame ( defined inpycyphal/transport/commons/high_overhead_transport/_frame.py)?

Probably not (yet) because you would have to update other transports dependent on this class.

@maksimdrachov
Copy link
Member Author

Ok, thanks.

"Datagram" still needs some work, will ping you when a review is needed.

@maksimdrachov
Copy link
Member Author

Does the passing of the local_node_id as source_node_id look right? (see last commit)

@maksimdrachov
Copy link
Member Author

I have updated the unit test, to take into account source_node_id

Functional tests (run pytest from pycyphal/transport/udp):

Screenshot 2022-10-15 at 15 21 56

Integration tests (pytest -k udp):

Screenshot 2022-10-15 at 15 21 29

Can you review these changes? I think the first step of implementing changes related to "Datagram" is done.

@pavel-kirienko
Copy link
Member

The changes look good to me. Later on, we may want to enable anonymous transfers for UDP as I wrote on the forum; it should be a cheap change to introduce. The relevant part of the source code can be found by searching for "In Cyphal/UDP, the anonymous mode is somewhat bolted-on."

The text and diagrams in the module docstring at _udp.py will also need to be updated, perhaps later.

Nice work so far ;)

@maksimdrachov
Copy link
Member Author

Later on, we may want to enable anonymous transfers for UDP as I wrote on the forum; it should be a cheap change to introduce.

The text and diagrams in the module docstring at _udp.py will also need to be updated, perhaps later.

Added to todo-list 👍

@maksimdrachov
Copy link
Member Author

maksimdrachov commented Oct 30, 2022

@pavel-kirienko

Could you review these changes? I just want to make sure that this part is correct (pycyphal/transport/udp/_ip) before I start updating the larger code base/unit tests.

Also some questions:

  1. service_data_specifier_to_multicast_group and message_data_specifier_to_multicast_group are pretty similar; so just combine them right? Instead of 4 functions in _endpoint_mapping.py, this would result in 2. (Equating MESSAGE_ID and SERVICE_ID into one variable NODE_ID would also help in this regard.)

  2. I renamed most of the constants used in _endpoint_mapping.py, let me know if this change make sense.

@pavel-kirienko
Copy link
Member

Nice progress but there seems to be a problem with subject-/service-/node-ID mixup. Please give another look at section 4.1.1 "Transport model" of the Specification.


To publish a message on subject S, we send a multicast datagram to the multicast group whose address is computed as:

image

And the destination UDP port is set to 16383.


To send a request or response on service X to node N, we send a multicast datagram to the multicast group whose address is computed as:

image

And the destination UDP port is set to (16384 + X * 2 + (is_response)).

@pavel-kirienko
Copy link
Member

Could you please update your branch to sync up with master?

@maksimdrachov maksimdrachov force-pushed the multicast-port branch 4 times, most recently from 6044614 to 829c071 Compare November 1, 2022 13:57
@maksimdrachov
Copy link
Member Author

Could you confirm the changes are correct? I think I have addressed the issues.

@maksimdrachov
Copy link
Member Author

maksimdrachov commented Nov 4, 2022

Regarding pycyphal/transport/udp/_ip/_v4.py:

class IPv4SocketFactory(SocketFactory):

  def __init__(self, domain_id: int)


  def make_output_socket(
    self, remote_node_id: typing.Optional[int], data_specifier: pycyphal.transport.DataSpecifier
  ) -> socket.socket:
    # General setup
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    s.setblocking(False)
    s.bind((str(self._local), 0))  # QUESTION: What local IP address bind to? (Does it need to be bound at all?)

    # Message
    remote_ip = message_data_specifier_to_multicast_group(self._domain_id, data_specifier)
    remote_port = SUBJECT_PORT

    # Service
    remote_ip = service_data_specifier_to_multicast_group(self._domain_id, remote_node_id, data_specifier)
    remote_port = service_data_specifier_to_udp_port(data_specifier)

    # Connect
    s.connect((str(remote_ip), remote_port))


  def make_input_socket(
    self, remote_node_id: typing.Optional[int], data_specifier: pycyphal.transport.DataSpecifier # CHANGE: need remote_node_id for service
    ) -> socket.socket:
    # General setup
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    s.setblocking(False)

    # Message
    multicast_ip = message_data_specifier_to_multicast_group(self._domain_id, data_specifier)
    multicast_port = SUBJECT_PORT

    # Service
    multicast_ip = service_data_specifier_to_multicast_group(self._domain_id_, remote_node_id, data_specifier)
    multicast_port = service_data_specifier_to_udp_port(data_specifier)

    # Bind
    s.bind((str(multicast_ip), multicast_port))

Look right?

@maksimdrachov
Copy link
Member Author

@pavel-kirienko

I think it's almost finished.

The last change that needs to be addressed, is mainly related to udp/_socket_reader.py.

class SocketReader contains the following note:

This class is the solution to the UDP demultiplexing problem.
The objective is to read data from the supplied socket, parse it, and then forward it to interested listeners.

Why can't we ask the operating system to do this for us? Because there is no portable way of doing this(except for multicast sockets). Even on GNU/Linux, there is a risk of race conditions, but I'll spare you the details. Those who care may read this: https://stackoverflow.com/a/54156768/1007777.

This seems to suggest that some significant overhaul/simplification can be done here. Can you give some pointers?

Some smaller questions:

  1. pycyphal/transport/udp/_tracer.py:160:
if ip_destination.is_multicast:
    if udp_packet.destination_port == SUBJECT_PORT:
        # Message packet
        dst_nid = None  # Broadcast
        data_spec = multicast_group_to_message_data_specifier(ip_source, ip_destination)
    else:
        # Service packet
        data_spec = udp_port_to_service_data_specifier(udp_packet.destination_port)
        # QUESTION: Correct to use DOMAIN_ID_MASK here? (or make a seperate function for this?)
        domain_id = (int(ip_destination)&DOMAIN_ID_MASK)>>18
        dst_nid = service_multicast_group_to_node_id(domain_id, ip_destination)
  1. tests/transport/udp/ip/v4.py:53/81:
msg_i = fac.make_input_socket(None, MessageDataSpecifier(612))
test_msg_o.sendto(b"Seagull", ("239.52.2.100", SUBJECT_PORT))
time.sleep(1) ##QUESTION: BlockingIOError: [Errno 35] Resource temporarily unavailable
rx = msg_i.recvfrom(1024)
assert rx[0] == b"Seagull"
assert rx[1][0] == "127.0.0.1"  # Same address we just bound to.

This extra sleep(1) wasn't necessary before, however after some changes it started failing with the above error. Any remarks on this?

@pavel-kirienko
Copy link
Member

This seems to suggest that some significant overhaul/simplification can be done here. Can you give some pointers?

Yes, indeed, as you have correctly guessed, it should be possible to get rid of the SocketReader or at least simplify it. There may be an arbitrary number of sockets connected to the same multicast endpoint and the OS should perform demultiplexing correctly for you. Maybe for now, in the interest of minimizing the scope of this changeset, we should keep the socket reader in place and consider removing it later.

QUESTION: Correct to use DOMAIN_ID_MASK here? (or make a seperate function for this?)

I am not sure what are you going to use the domain-ID here for? It doesn't seem to be needed to parse the frame unless I am missing something.

This extra sleep(1) wasn't necessary before, however after some changes it started failing with the above error. Any remarks on this?

The kernel works in mysterious ways, what else can I say? Since this is just a test, it is fine to simply keep the sleep in place.

@pavel-kirienko
Copy link
Member

There is some discussion on the forum that you should be aware of: https://forum.opencyphal.org/t/cyphal-udp-architectural-issues-caused-by-the-dependency-between-the-nodes-ip-address-and-its-identity/1765/41

There's nothing major but it seems like we'll have to shuffle some bits around the header and the IP address, start using one common UDP port number for all traffic and discriminate services based on a dedicated service-ID field in the header instead of UDP ports, and also possibly add a header checksum. All of these changes seem quite minor in comparison to what you've already implemented here.

@maksimdrachov
Copy link
Member Author

maksimdrachov commented Nov 20, 2022

@pavel-kirienko

First I re-wrote pycyphal/transport/udp/_socket_reader.py like this:

def _dispatch_frame(
    self, timestamp: Timestamp, source_ip_address: _IPAddress, frame: typing.Optional[UDPFrame]
) -> None:
    # Do not accept datagrams emitted by the local node itself. Do not update the statistics either.
    external = self._anonymous or (source_ip_address != self._local_ip_address)
    if not external:
        return

    # Process the datagram. This is where the actual demultiplexing takes place.
    # The node-ID mapper will return None for datagrams coming from outside of our Cyphal subnet.
    handled = False
    source_node_id = None
    if frame is not None:
        # if source_ip_address is part of our Cyphal subnet
        if (DOMAIN_ID_MASK & int(source_ip_address)) == (DOMAIN_ID_MASK & int(self._local_ip_address)):
            source_node_id = frame.source_node_id
        # if source_ip_address is not part of our Cyphal subnet, source_node_id is None
        else:
            source_node_id = None

Now I'm starting to suspect that this is not how it's meant to be. Instead it should be:

  • class SocketReader needs a new variable domain_id
def __init__(
    self,
    sock: socket.socket,
    local_ip_address: _IPAddress,
    domain_id: int,
    anonymous: bool,
    statistics: SocketReaderStatistics,
):
  self._domain_id = domain_id
  • def _dispatch_frame becomes:
def _dispatch_frame(
    self, timestamp: Timestamp, source_ip_address: _IPAddress, frame: typing.Optional[UDPFrame]
) -> None:
    # Do not accept datagrams emitted by the local node itself. Do not update the statistics either.
    external = self._anonymous or (source_ip_address != self._local_ip_address)
    if not external:
        return

    # Process the datagram. This is where the actual demultiplexing takes place.
    # The node-ID mapper will return None for datagrams coming from outside of our Cyphal subnet.
    handled = False
    source_node_id = None
    if frame is not None:
        # if source_ip_address is part of our Cyphal subnet
        if self._domain_id == (DOMAIN_ID_MASK & int(source_ip_address)):
            source_node_id = frame.source_node_id
        # if source_ip_address is not part of our Cyphal subnet, source_node_id is None
        else:
            source_node_id = None

I'm not sure if source_ip_address is the multicast address here?

Note to self: replace subnet with domain_id, to avoid further confusion.

@pavel-kirienko
Copy link
Member

Now I'm starting to suspect that this is not how it's meant to be

You are correct in suspecting this!

I'm not sure if source_ip_address is the multicast address here?

In this new design, unicast IP addresses are no longer relevant at all. Any node can operate on any domain-ID with any node-ID regardless of its identity on the IP layer. Parameters like the source_ip_address should no longer be used anywhere except the socket factory (where it is needed only to initialize the socket correctly, this is done once per socket). The socket reader now only needs to read the datagram and pass it along regardless of the source address or the local IP address.

@maksimdrachov
Copy link
Member Author

maksimdrachov commented Nov 23, 2022

@pavel-kirienko

Can you check this _socket_reader implementation?

Main changes:

  • Removed self._local_ip_address
  • Removed self._anonymous
  • Changed SocketReaderStatistics
# Old
accepted_datagrams: typing.Dict[int, int] = dataclasses.field(default_factory=dict)
dropped_datagrams: typing.Dict[typing.Union[_IPAddress, int], int] = dataclasses.field(default_factory=dict)

# New
accepted_datagrams: typing.Dict[typing.Optional[int], int] = dataclasses.field(default_factory=dict)
dropped_datagrams: typing.Dict[typing.Optional[int], int] = dataclasses.field(default_factory=dict)

# Keys:
# None: Invalid node-ID (for dropped_datagrams)
# None: anonymous frame (for accepted_datagrams)
# Int: node-ID

Concerning the keys: use None for anonymous frames? (currently 0xffff but easy fix)

(Ignore the QUESTIONs in the code.)

Unit tests _unittest_socket_readerand _unittest_socket_reader_endpoint_reuse are running successfully, but need some additional changes to better reflect the new class structure.

pycyphal/transport/serial/_frame.py Outdated Show resolved Hide resolved
pycyphal/transport/serial/_frame.py Outdated Show resolved Hide resolved
pycyphal/transport/udp/_frame.py Outdated Show resolved Hide resolved
pycyphal/transport/udp/_frame.py Outdated Show resolved Hide resolved
pycyphal/transport/udp/_frame.py Outdated Show resolved Hide resolved
pycyphal/transport/udp/_session/_output.py Outdated Show resolved Hide resolved
pycyphal/transport/udp/_udp.py Outdated Show resolved Hide resolved
pycyphal/transport/udp/_udp.py Outdated Show resolved Hide resolved
tests/transport/redundant/_redundant.py Outdated Show resolved Hide resolved
tests/transport/udp/ip/link_layer.py Show resolved Hide resolved
maksimdrachov and others added 10 commits March 12, 2023 08:54
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
Co-authored-by: Pavel Kirienko <pavel.kirienko@gmail.com>
@pavel-kirienko
Copy link
Member

image

This is easy to fix: make UDPTransport accept IPAddress|str; if the argument is a string, use ipaddress.ip_address(str) to construct IPAddress.

Aside from that (and a few related type errors), the only remaining issue is that of the statistics. #279

@pavel-kirienko
Copy link
Member

Please also bump the minor version number and add a new section to the changelog.

import copy
@pavel-kirienko pavel-kirienko merged commit adc7e92 into OpenCyphal:master Mar 12, 2023
@maksimdrachov maksimdrachov deleted the multicast-port branch March 22, 2023 09:43
pavel-kirienko added a commit that referenced this pull request Apr 13, 2023
- Fix #288.
- Update MyPy, PyLint, PyTest, and Coverage to the latest versions.
- Address one TODO comment left over from #253.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants